1 Introduction

The South Sudan Monitoring and Evaluation Support Project (MESP) is conducting a household resilience survey in 13 counties. The purpose of this household survey is to obtain baseline data in the target areas for the indicators included in the Mission’s Performance Management Plan (PMP) and the Community Roadmap, in support of USAID/South Sudan’s Strategic Framework (2020-2024).

MESP has completed data collection in six counties, and as of October 2021 is carrying out data collection in the remaining seven counties. The study team is currently reviewing the data from six counties in advance of analysis of the full set of 13 counties. This document reviews a selection of measures and indices relating to resilience.

2 Resilience measures

The United States Agency for International Development (USAID) defines resilience as the ability of people, households, communities, countries, and systems to mitigate, adapt to and recover from shocks and stresses in a manner that reduces chronic vulnerability and facilitates inclusive growth (Measuring Resilience in USAID). For USAID-funded activities, resilience is operationalized as a set of capacities that mediate the interaction of shocks and stresses on a measure of household well-being (Tango International 2018).

Common well-being measures include poverty, food insecurity, child wasting, and self-perceived ability to recover from shocks or stresses. Shocks and stresses are measured across a range of specific events (shocks) or risks (stresses) and aggregated to a single measure of exposure. Capacities are measured along three primary indices, with each dimension consisting of its own set of sub-indices:

  • Absorptive capacity
    • Access to informal safety nets
    • Bonding social capital
    • Access to cash savings
    • Access to remittances
    • Asset ownership
    • Shock preparedness and mitigation
    • Access to insurance
    • Access to humanitarian/development assistance
  • Adaptive capacity
    • Bridging social capital
    • Linking social capital
    • Social network index
    • Education/training
    • Livelihood diversification
    • Exposure to information
    • Adoption of improved practices
    • Asset ownership
    • Availability of financial services
    • Aspirations/confidence to adapt/locus of control
  • Transformative capacity
    • Access to formal safety nets
    • Access to markets
    • Access to communal natural resources
    • Access to basic services
    • Access to infrastructure
    • Access to livestock services
    • Bridging social capital
    • Linking social capital
    • Collective action
    • Social cohesion
    • Gender equitable decision-making index
    • Participation in local decision-making
    • Local government responsiveness
    • Gender index

The Food and Agricultural Organization (FAO) also has a methodology for measuring resilience (FAO 2016), which adds measures such as dietary diversity as a complement to food insecurity.

The South Sudan resilience study reviewed this list of outcomes, shocks, and capacities and selected a subset of them for measuring resilience:

  • Aspirations index
  • Dietary diversity
  • Bonding social capital
  • Bridging social capital
  • Food insecurity
  • Shocks and stresses
  • Belief in local government support
  • Social norms

This document will conduct preliminary review of these measures, using household survey data in six counties.

3 Resilience data

The data comes from a household survey in the counties of Akobo, Budi, Duk, Leer, Pibor, and Uror, using a two-stage sampling procedure where the sampling points, referred to as enumeration areas (EAs), are randomly selected within each county, and households are randomly selected within each EA. The following map illustrates the sampling points across all six counties.

include_graphics(here("exploratory analysis/viz/maps/MESP_HH_Survey_Sampling_Point_ALL.jpg"))

A total sample size of 3,563 households were apportioned relatively evenly across the six counties.

smpl <- svyrdat %>%
  group_by(County=county) %>%
  summarize(`Population`=survey_total(),
            `Sample`=unweighted(n())) %>%
  select(1,2,4) 

smpl_gt <- smpl %>%
  gt() %>%
  fmt_number(2:3,
             decimals=0)

smpl_gt
County Population Sample
Akobo 11,434 625
Budi 13,071 620
Duk 6,275 518
Leer 5,716 620
Pibor 36,526 564
Uror 39,354 616

Under simple random sampling, and using the measure of belief in local government support in the face of shocks and stresses, the overall margin of error of the survey would be 1.9 percent, per the following table.

dat %>%
  summarize(mn = mean(resil_c, na.rm=T),
            std.err = std.error(resil_c, na.rm=T),
            margin = 1.96*std.err)
mnstd.errmargin
0.4090.009670.0189

Given the clustered nature of the data collection, the complex margin of error overall is 5.9 percent, per the following table:

svymean(~resil_c,
        na.rm=T,
        deff="replace",
        design=svydat) %>%
  as.data.frame() %>%
  mutate(Margin = 1.96*resil_c,
         deft=sqrt(deff))
meanresil_cdeffMargindeft
0.3510.030210.30.05923.22

Across county, the simple margin of error ranges from 3.5 - 5.5 percent, per the following table:

dat %>%
  group_by(county) %>%
  summarize(mn=mean(resil_c, na.rm=T),
            se=std.error(resil_c, na.rm=T),
            margin=1.96*se)
countymnsemargin
Akobo0.3080.02670.0524
Budi0.6220.02030.0398
Duk0.2620.01970.0386
Leer0.6250.02380.0466
Pibor0.1980.01780.035 
Uror0.4010.02820.0554

Accounting for the clustered nature of the data collection, the complex margin of error across county ranges from 3.6 - 19.1 percent, per the following table:

svyrdat %>%
  group_by(county) %>%
  summarize(mn=survey_mean(resil_c, na.rm=T)) %>%
  mutate(margin=1.96*mn_se)
countymnmn_semargin
Akobo0.2930.04310.0845
Budi0.5  0.09540.187 
Duk0.3640.09770.191 
Leer0.63 0.04650.0912
Pibor0.2020.01850.0363
Uror0.4550.08620.169 

In the exploratory analysis that follows, both weighted or unweighted frequencies may be used. Generally speaking, when reporting an overall measure such as a mean or frequency, weighted measures should be considered as the estimates of their respective population parameters. In other cases where interest may be more in how the measures vary across disaggregations of interest, unweighted measures may be used.

4 Aspirations index

Following (Tango International 2018), the Aspirations index comprises six binary choice survey items, in which respondents either affirm or deny statements relating to their sense of agency in life, or whether the household has concrete plans for the future. The following table shows these items and their overall weighted frequencies.

asp <- svyrdat %>%
  group_by() %>%
  summarize(asp1=mean(asp1, na.rm=T),
            asp2=mean(asp2, na.rm=T),
            asp3=mean(asp3, na.rm=T),
            asp4=mean(asp4, na.rm=T),
            asp5=mean(asp5, na.rm=T),
            asp6=mean(asp6, na.rm=T)) %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column("var_name") %>%
  mutate(item_num = c("q_629", "q_630", "q_634", "q_635", "q_632", "q_633"), 
         Description=c("Each person is responsible for his/her own success or failure in life",
                       "To be successful one needs to work very hard rather than rely on luck",
                        "My experience in life has been that was is going to happen will happen (reverse-coded)",
                        "It is not always good for me to plan too far ahead because many things turn out to be a matter of good or bad fortune (reverse-coded)",
                        "Hopeful for children's future",
                        "Desire for their children to graduate from secondary school")) %>%
   select(var_name, item_num, Description, Percent=V1)

asp

var_nameitem_numDescriptionPercent
asp1q_629Each person is responsible for his/her own success or failure in life0.781
asp2q_630To be successful one needs to work very hard rather than rely on luck0.797
asp3q_634My experience in life has been that was is going to happen will happen (reverse-coded)0.342
asp4q_635It is not always good for me to plan too far ahead because many things turn out to be a matter of good or bad fortune (reverse-coded)0.368
asp5q_632Hopeful for children's future0.839
asp6q_633Desire for their children to graduate from secondary school0.916
Note the low values for the reverse coded items, suggesting some combination of acquiescence bias (uncritically affirming the implicit premise of the question) or lack of comprehension of a reverse-coded question in which the respondent must disagree with the implicit premise in order to communicate their stance. These reverse-coded items should be considered more critically for inclusion in any index computation.

A simple summation of the six binary items provides an ordered scale ranging from 0-6.

The simple mean and other descriptives

describe(dat$aspirations_index2)
varsnmeansdmediantrimmedmadminmaxrangeskewkurtosisse
13.45e+034.041.1644.071.48066-0.254-0.08710.0197

The overall weighted value is 3.9 out of six, with a margin of 0.12. The median is four.

svyrdat %>%
  summarize(Aspirations=survey_mean(aspirations_index2, na.rm=T),
            Aspirations_med = survey_quantile(aspirations_index2, .5, na.rm=T)) %>%
  mutate(margin=1.96*Aspirations_se)

AspirationsAspirations_seAspirations_med_q50Aspirations_med_q50_semargin
3.90.0591400.116
The distribution of the weighted scale values.

ggplot(dat, aes(aspirations_index2)) + 
  geom_bar(width=.2, fill="dodgerblue2") + 
  geom_vline(xintercept=3.9, size=1, color="darkgoldenrod2") + 
  geom_vline(xintercept=4, size=1, color="turquoise") +
  scale_x_continuous(breaks=0:6)

Unweighted values by county.

asp_cnty <- dat %>%
  group_by(county) %>%
  summarize(asp = mean(aspirations_index2, na.rm=T),
            se = std.error(aspirations_index2)) %>%
  mutate(lower=asp-1.96*se,
         upper=asp+1.96*se)

asp_cnty
countyaspselowerupper
Akobo4   0.04753.9 4.09
Budi4.420.055 4.324.53
Duk3.950.03423.894.02
Leer4.380.04544.3 4.47
Pibor3.540.03493.473.61
Uror3.890.05233.783.99
ggplot(asp_cnty, aes(asp, fct_reorder(county, asp))) + 
  geom_vline(xintercept=3.9, size=1, alpha=.7, color="darkgoldenrod2") +
  geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") + 
  geom_point(color="dodgerblue", size=3) + 
  geom_errorbar(aes(xmin=lower, xmax=upper),
                color="dodgerblue",
                width=0,
                size=1.1) +
  scale_x_continuous(limits=c(0,6),
                     breaks=0:6) +
  labs(x="Aspirations index",
       y="")

Weighted values by county

asp_cnty_wt <- svyrdat %>%
  group_by(county) %>%
  summarize(asp = survey_mean(aspirations_index2, na.rm=T)) %>%
  mutate(lower=asp-1.96*asp_se,
         upper=asp+1.96*asp_se)

asp_cnty_wt
countyaspasp_selowerupper
Akobo4.040.104 3.844.24
Budi4.560.116 4.334.79
Duk3.990.06723.864.12
Leer4.410.05574.3 4.52
Pibor3.540.07263.4 3.69
Uror3.870.136 3.6 4.14
ggplot(asp_cnty_wt, aes(asp, fct_reorder(county, asp))) + 
  geom_vline(xintercept=3.9, size=1, alpha=.7, color="darkgoldenrod2") +
  geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") + 
  geom_point(color="dodgerblue", size=3) + 
  geom_errorbar(aes(xmin=lower, xmax=upper),
                color="dodgerblue",
                width=0,
                size=1) +
  scale_x_continuous(limits=c(0,6),
                     breaks=0:6) +
  labs(x="Aspirations index",
       y="")

4.1 Aspirations index - factor analysis construction

Assess suitability of measures for index construction using factor analysis

dat %>%
  select(asp1:asp6) %>%
  fa.parallel(cor="tet")

## Parallel analysis suggests that the number of factors =  3  and the number of components =  3

Three factors or components out of six items is not a good sign.

asp_alpha <- dat %>%
  select(asp1:asp6) %>%
  alpha()
## Some items ( asp3 asp4 ) were negatively correlated with the total scale and 
## probably should be reversed.  
## To do this, run the function again with the 'check.keys=TRUE' option
asp_alpha
## 
## Reliability analysis   
## Call: alpha(x = .)
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
##       0.29      0.29    0.31     0.063 0.41 0.018 0.67 0.19    0.064
## 
##  lower alpha upper     95% confidence boundaries
## 0.26 0.29 0.33 
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r  med.r
## asp1      0.27      0.25    0.27     0.064 0.34    0.019 0.0210  0.028
## asp2      0.22      0.20    0.24     0.048 0.25    0.020 0.0222 -0.011
## asp3      0.27      0.29    0.27     0.075 0.40    0.019 0.0088  0.071
## asp4      0.19      0.22    0.22     0.055 0.29    0.022 0.0108  0.037
## asp5      0.29      0.26    0.27     0.065 0.35    0.018 0.0183  0.033
## asp6      0.29      0.28    0.30     0.073 0.40    0.019 0.0217  0.068
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## asp1 3541  0.46  0.47  0.24   0.12 0.78 0.41
## asp2 3540  0.50  0.52  0.34   0.18 0.80 0.40
## asp3 3540  0.51  0.43  0.23   0.12 0.34 0.47
## asp4 3541  0.59  0.50  0.36   0.21 0.37 0.48
## asp5 3531  0.40  0.46  0.24   0.08 0.84 0.37
## asp6 3477  0.31  0.43  0.16   0.07 0.92 0.28
## 
## Non missing response frequency for each item
##         0    1 miss
## asp1 0.22 0.78 0.01
## asp2 0.20 0.80 0.01
## asp3 0.66 0.34 0.01
## asp4 0.63 0.37 0.01
## asp5 0.16 0.84 0.01
## asp6 0.08 0.92 0.02

A reliability (alpha) value of .29 is unacceptable for index formation. This will also exhibit itself in the loadings of an exploratory factor analysis (EFA).

asp_fa <- dat %>%
  select(asp1:asp6) %>%
  fa(cor="tet")
asp_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "tet")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1    h2   u2 com
## asp1  0.31 0.095 0.90   1
## asp2  0.37 0.134 0.87   1
## asp3 -0.28 0.077 0.92   1
## asp4 -0.21 0.044 0.96   1
## asp5  0.79 0.620 0.38   1
## asp6  0.43 0.184 0.82   1
## 
##                 MR1
## SS loadings    1.15
## Proportion Var 0.19
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  15  and the objective function was  0.99 with Chi Square of  3511
## The degrees of freedom for the model are 9  and the objective function was  0.62 
## 
## The root mean square of the residuals (RMSR) is  0.17 
## The df corrected root mean square of the residuals is  0.22 
## 
## The harmonic number of observations is  3515 with the empirical chi square  3142  with prob <  0 
## The total number of observations was  3563  with Likelihood Chi Square =  2210  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  -0.049
## RMSEA index =  0.262  and the 90 % confidence intervals are  0.253 0.271
## BIC =  2136
## Fit based upon off diagonal values = 0.5
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.83
## Multiple R square of scores with factors          0.69
## Minimum correlation of possible factor scores     0.39

Only two of the six items have loadings exceeding 0.4, while the reverse-coded items have negative loadings. Let’s try again without the reverse-coded items.

dat %>%
  select(asp1:asp2,
         asp5:asp6) %>%
  fa.parallel(cor="tet")

## Parallel analysis suggests that the number of factors =  2  and the number of components =  2
asp2_alpha
## Error in eval(expr, envir, enclos): object 'asp2_alpha' not found

Two factors or components out of four items is still not a good sign.

asp2_alpha <- dat %>%
  select(asp1:asp2,
         asp5:asp6) %>%
  alpha()

asp2_alpha
## 
## Reliability analysis   
## Call: alpha(x = .)
## 
##   raw_alpha std.alpha G6(smc) average_r  S/N   ase mean   sd median_r
##       0.36      0.36    0.31      0.12 0.55 0.017 0.83 0.22     0.13
## 
##  lower alpha upper     95% confidence boundaries
## 0.33 0.36 0.39 
## 
##  Reliability if an item is dropped:
##      raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## asp1      0.31      0.32    0.24     0.136 0.47    0.020 0.0038 0.142
## asp2      0.25      0.26    0.20     0.106 0.36    0.021 0.0094 0.122
## asp5      0.25      0.23    0.18     0.091 0.30    0.021 0.0098 0.072
## asp6      0.35      0.35    0.27     0.154 0.55    0.019 0.0015 0.142
## 
##  Item statistics 
##         n raw.r std.r r.cor r.drop mean   sd
## asp1 3541  0.62  0.57  0.29   0.18 0.78 0.41
## asp2 3540  0.64  0.60  0.36   0.23 0.80 0.40
## asp5 3531  0.62  0.62  0.40   0.23 0.84 0.37
## asp6 3477  0.44  0.54  0.24   0.13 0.92 0.28
## 
## Non missing response frequency for each item
##         0    1 miss
## asp1 0.22 0.78 0.01
## asp2 0.20 0.80 0.01
## asp5 0.16 0.84 0.01
## asp6 0.08 0.92 0.02

A reliability score of .36 is still unacceptable. Let’s look at the EFA loadings.

asp2_fa <- dat %>%
  select(asp1:asp2,
         asp5:asp6) %>%
  fa(cor="tet")
asp2_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "tet")
## Standardized loadings (pattern matrix) based upon correlation matrix
##       MR1   h2   u2 com
## asp1 0.36 0.13 0.87   1
## asp2 0.47 0.22 0.78   1
## asp5 0.73 0.53 0.47   1
## asp6 0.43 0.19 0.81   1
## 
##                 MR1
## SS loadings    1.07
## Proportion Var 0.27
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  6  and the objective function was  0.44 with Chi Square of  1564
## The degrees of freedom for the model are 2  and the objective function was  0.12 
## 
## The root mean square of the residuals (RMSR) is  0.11 
## The df corrected root mean square of the residuals is  0.19 
## 
## The harmonic number of observations is  3506 with the empirical chi square  505  with prob <  2.3e-110 
## The total number of observations was  3563  with Likelihood Chi Square =  414  with prob <  1.2e-90 
## 
## Tucker Lewis Index of factoring reliability =  0.206
## RMSEA index =  0.24  and the 90 % confidence intervals are  0.221 0.26
## BIC =  398
## Fit based upon off diagonal values = 0.84
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.80
## Multiple R square of scores with factors          0.64
## Minimum correlation of possible factor scores     0.28

The loadings aren’t terrible, but measures of fit such as Root Mean Square Error of Approximation (RMSEA) and Tucker Lewis Index (TLI) indicate that the factor model should not be used.

Let’s include measures from the locus of control, which is a sub-index of the aspirations index. These items are attitudinal statements:

  • I can mostly determine what will happen in my life
  • When I get what I want, it is usually because I worked hard for it
  • My life is determined by my own actions
dat %>%
  select(asp1:asp2,
         asp5:asp6,
         q_636:q_638) %>%
  fa.parallel(cor="mixed")

## Parallel analysis suggests that the number of factors =  3  and the number of components =  3

Three factors or components out of seven items, still not a good sign.

asp3_alpha <- dat %>%
  select(asp1:asp2,
         asp5:asp6,
         q_636:q_638) %>%
  alpha()

asp3_alpha
## 
## Reliability analysis   
## Call: alpha(x = .)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
##       0.55       0.5    0.52      0.13   1 0.0081  2.4 0.52    0.086
## 
##  lower alpha upper     95% confidence boundaries
## 0.54 0.55 0.57 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r  S/N alpha se  var.r med.r
## asp1       0.55      0.48    0.50     0.135 0.94   0.0082 0.0271 0.073
## asp2       0.57      0.50    0.51     0.144 1.01   0.0080 0.0258 0.097
## asp5       0.55      0.47    0.48     0.127 0.87   0.0082 0.0284 0.053
## asp6       0.57      0.53    0.53     0.159 1.13   0.0081 0.0225 0.119
## q_636      0.45      0.44    0.45     0.117 0.80   0.0096 0.0152 0.097
## q_637      0.40      0.39    0.40     0.097 0.65   0.0105 0.0136 0.073
## q_638      0.37      0.40    0.39     0.101 0.67   0.0119 0.0096 0.086
## 
##  Item statistics 
##          n raw.r std.r r.cor r.drop mean   sd
## asp1  3541 0.262  0.46  0.27  0.150 0.78 0.41
## asp2  3540 0.183  0.42  0.21  0.070 0.80 0.40
## asp5  3531 0.249  0.49  0.32  0.157 0.84 0.37
## asp6  3477 0.098  0.36  0.12  0.022 0.92 0.28
## q_636 3540 0.754  0.54  0.44  0.426 4.01 1.62
## q_637 3542 0.737  0.62  0.58  0.505 4.91 1.26
## q_638 3539 0.789  0.61  0.58  0.537 4.71 1.46
## 
## Non missing response frequency for each item
##          0    1    2    3    4    5    6 miss
## asp1  0.22 0.78 0.00 0.00 0.00 0.00 0.00 0.01
## asp2  0.20 0.80 0.00 0.00 0.00 0.00 0.00 0.01
## asp5  0.16 0.84 0.00 0.00 0.00 0.00 0.00 0.01
## asp6  0.08 0.92 0.00 0.00 0.00 0.00 0.00 0.02
## q_636 0.00 0.08 0.21 0.04 0.11 0.41 0.15 0.01
## q_637 0.00 0.03 0.06 0.02 0.08 0.44 0.35 0.01
## q_638 0.00 0.06 0.08 0.03 0.08 0.41 0.34 0.01

A reliability score of .55 is an improvement, but still does not reach the desired benchmark of .7.

asp3_fa <- dat %>%
  select(asp1:asp2,
         asp5:asp6,
         q_636:q_638) %>%
  fa(cor="mixed")
asp3_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "mixed")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1    h2   u2 com
## asp1  0.31 0.097 0.90   1
## asp2  0.24 0.056 0.94   1
## asp5  0.36 0.128 0.87   1
## asp6  0.17 0.028 0.97   1
## q_636 0.54 0.287 0.71   1
## q_637 0.66 0.436 0.56   1
## q_638 0.67 0.444 0.56   1
## 
##                 MR1
## SS loadings    1.47
## Proportion Var 0.21
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  21  and the objective function was  1.11 with Chi Square of  3958
## The degrees of freedom for the model are 14  and the objective function was  0.48 
## 
## The root mean square of the residuals (RMSR) is  0.13 
## The df corrected root mean square of the residuals is  0.16 
## 
## The harmonic number of observations is  3518 with the empirical chi square  2642  with prob <  0 
## The total number of observations was  3563  with Likelihood Chi Square =  1692  with prob <  0 
## 
## Tucker Lewis Index of factoring reliability =  0.36
## RMSEA index =  0.183  and the 90 % confidence intervals are  0.176 0.191
## BIC =  1578
## Fit based upon off diagonal values = 0.69
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.83
## Multiple R square of scores with factors          0.70
## Minimum correlation of possible factor scores     0.39

Let’s try just the three locus of control items.

dat %>%
  select(q_636:q_638) %>%
  fa.parallel(cor="poly")

## Parallel analysis suggests that the number of factors =  1  and the number of components =  1

One factor or component, that’s hopeful.

asp4_alpha <- dat %>%
  select(q_636:q_638) %>%
  alpha()

asp4_alpha
## 
## Reliability analysis   
## Call: alpha(x = .)
## 
##   raw_alpha std.alpha G6(smc) average_r S/N    ase mean  sd median_r
##       0.68      0.69     0.6      0.43 2.2 0.0091  4.5 1.1     0.43
## 
##  lower alpha upper     95% confidence boundaries
## 0.67 0.68 0.7 
## 
##  Reliability if an item is dropped:
##       raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## q_636      0.66      0.66    0.50      0.50 2.0    0.011    NA  0.50
## q_637      0.60      0.60    0.43      0.43 1.5    0.013    NA  0.43
## q_638      0.51      0.52    0.35      0.35 1.1    0.016    NA  0.35
## 
##  Item statistics 
##          n raw.r std.r r.cor r.drop mean  sd
## q_636 3540  0.79  0.76  0.54   0.46  4.0 1.6
## q_637 3542  0.75  0.78  0.61   0.50  4.9 1.3
## q_638 3539  0.82  0.82  0.68   0.56  4.7 1.5
## 
## Non missing response frequency for each item
##          1    2    3    4    5    6 miss
## q_636 0.08 0.21 0.04 0.11 0.41 0.15 0.01
## q_637 0.03 0.06 0.02 0.08 0.44 0.35 0.01
## q_638 0.06 0.08 0.03 0.08 0.41 0.34 0.01

Reliability of .68, that’s good enough for an index.

asp4_fa <- dat %>%
  select(q_636:q_638) %>%
  fa(cor="poly")

asp4_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1   h2   u2 com
## q_636 0.60 0.35 0.65   1
## q_637 0.63 0.40 0.60   1
## q_638 0.79 0.62 0.38   1
## 
##                 MR1
## SS loadings    1.38
## Proportion Var 0.46
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  3  and the objective function was  0.57 with Chi Square of  2027
## The degrees of freedom for the model are 0  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic number of observations is  3538 with the empirical chi square  0  with prob <  NA 
## The total number of observations was  3563  with Likelihood Chi Square =  0  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.86
## Multiple R square of scores with factors          0.74
## Minimum correlation of possible factor scores     0.48
library(effectsize)
interpret_nnfi(asp4_fa)
## Error in UseMethod("interpret"): no applicable method for 'interpret' applied to an object of class "c('psych', 'fa')"

I’m not sure how to interpret the fit scores, but I think we can generate an index out of these three items. Let’s impute missing values and generate the index.

4.2 Impute missing values

locus <- dat %>%
  select(county,
         hhsize=household_members,
         q_636:q_638) %>%
  as.data.frame() %>%
  mutate(County = factor(county)) %>%
  select(-county)

locus[,1:4] <- lapply(locus[,1:4], as.numeric)
#str(locus)
locus_imp <- missForest(locus)
##   missForest iteration 1 in progress...done!
##   missForest iteration 2 in progress...done!
##   missForest iteration 3 in progress...done!
##   missForest iteration 4 in progress...done!
locus_imp$OOBerror
## NRMSE   PFC 
## 0.762 0.000
locus_out <- locus_imp$ximp

Compare means for raw and imputed variables

describe(locus)
varsnmeansdmediantrimmedmadminmaxrangeskewkurtosisse
13.55e+034.972.8654.782.97123220.707  1.51 0.048 
23.54e+034.011.6254.121.48165-0.557  -1.12 0.0272
33.54e+034.911.2655.171.48165-1.59   2.03 0.0211
43.54e+034.711.4654.961.48165-1.3    0.6520.0246
53.56e+033.481.7343.482.971650.00988-1.3  0.029 
describe(locus_out)
varsnmeansdmediantrimmedmadminmaxrangeskewkurtosisse
13.56e+034.972.8554.782.97123220.706  1.51 0.0478
23.56e+034.021.6154.121.48165-0.561  -1.11 0.027 
33.56e+034.911.2555.171.48165-1.59   2.06 0.021 
43.56e+034.711.4654.961.48165-1.31   0.6670.0244
53.56e+033.481.7343.482.971650.00988-1.3  0.029 
a <- locus %>%
  mutate(id=1:3563) %>%
  filter(is.na(hhsize) |
           is.na(q_636) |
           is.na(q_637) |
           is.na(q_638))

a_id <- a$id

a[1:6,]
hhsizeq_636q_637q_638Countyid
642Akobo107
566Duk325
4Duk342
666Duk376
756Duk425
666Duk510
b <- locus_out %>%
  mutate(id=1:3563) %>%
  filter(id %in% a_id)

b[1:6,]

hhsizeq_636q_637q_638Countyid
5.136   4   2   Akobo107
5   5.536   6   Duk325
4   4.915.115.3 Duk342
5.966   6   6   Duk376
7   5   6   5.53Duk425
5.966   6   6   Duk510
Now generate the index

loc_fa <- locus_out %>%
  select(2:4) %>%
  fa(cor="poly")
loc_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1   h2   u2 com
## q_636 0.59 0.34 0.66   1
## q_637 0.63 0.39 0.61   1
## q_638 0.79 0.62 0.38   1
## 
##                 MR1
## SS loadings    1.36
## Proportion Var 0.45
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  3  and the objective function was  0.56 with Chi Square of  1977
## The degrees of freedom for the model are 0  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic number of observations is  3563 with the empirical chi square  0  with prob <  NA 
## The total number of observations was  3563  with Likelihood Chi Square =  0  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.86
## Multiple R square of scores with factors          0.74
## Minimum correlation of possible factor scores     0.48

Compare to the non-imputed loadings

asp4_fa
## Factor Analysis using method =  minres
## Call: fa(r = ., cor = "poly")
## Standardized loadings (pattern matrix) based upon correlation matrix
##        MR1   h2   u2 com
## q_636 0.60 0.35 0.65   1
## q_637 0.63 0.40 0.60   1
## q_638 0.79 0.62 0.38   1
## 
##                 MR1
## SS loadings    1.38
## Proportion Var 0.46
## 
## Mean item complexity =  1
## Test of the hypothesis that 1 factor is sufficient.
## 
## The degrees of freedom for the null model are  3  and the objective function was  0.57 with Chi Square of  2027
## The degrees of freedom for the model are 0  and the objective function was  0 
## 
## The root mean square of the residuals (RMSR) is  0 
## The df corrected root mean square of the residuals is  NA 
## 
## The harmonic number of observations is  3538 with the empirical chi square  0  with prob <  NA 
## The total number of observations was  3563  with Likelihood Chi Square =  0  with prob <  NA 
## 
## Tucker Lewis Index of factoring reliability =  -Inf
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy             
##                                                    MR1
## Correlation of (regression) scores with factors   0.86
## Multiple R square of scores with factors          0.74
## Minimum correlation of possible factor scores     0.48

The loading for q_636 goes down a tad, but otherwise ok

dat <- dat %>%
  mutate(Locus = loc_fa$scores)

describe(dat$Locus)
varsnmeansdmediantrimmedmadminmaxrangeskewkurtosisse
13.56e+03-2.51e-160.8650.2830.1250.557-2.71.033.73-1.281.10.0145
ggplot(dat, aes(Locus)) + 
  geom_density(size=1, color="darkblue")

4.3 Compare output

By county

loc_cnty <- dat %>%
  group_by(county) %>%
  summarize(loc = mean(Locus, na.rm=T),
            se = std.error(Locus),
            q638=mean(q_638, na.rm=T),
            se2 = std.error(q_638, na.rm=T)) %>%
  mutate(lower=loc-1.96*se,
         upper=loc+1.96*se,
         lower2 = q638-1.96*se2,
         upper2 = q638+1.96*se2)

loc_cnty
countylocseq638se2lowerupperlower2upper2
Akobo-0.157  0.039 4.290.0683-0.233 -0.08024.164.43
Budi-0.003180.03174.830.0478-0.06530.05894.744.93
Duk0.492  0.021 5.4 0.03530.45  0.533 5.335.47
Leer-0.209  0.03694.290.0712-0.281 -0.136 4.154.43
Pibor0.351  0.01555.240.02510.321 0.382 5.195.29
Uror-0.363  0.04144.390.0687-0.444 -0.281 4.254.52
ggplot(loc_cnty, aes(loc, fct_reorder(county, loc))) + 
  geom_vline(xintercept=0, size=1, alpha=.7, color="grey60") +
#  geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") + 
  geom_point(color="dodgerblue", size=3) + 
  geom_errorbar(aes(xmin=lower, xmax=upper),
                color="dodgerblue",
                width=0,
                size=1.1) +
  scale_x_continuous(limits=c(-1,1),
                     breaks=seq(-1,1,.5)) +
  labs(x="Locus of control index",
       y="")

ggplot(loc_cnty, aes(q638, fct_reorder(county, q638))) + 
  geom_vline(xintercept=mean(dat$q_638, na.rm=T), size=1, alpha=.7, color="darkgoldenrod2") +
  #geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") + 
  geom_point(color="dodgerblue", size=3) + 
  geom_errorbar(aes(xmin=lower2, xmax=upper2),
                color="dodgerblue",
                width=0,
                size=1.1) +
  scale_x_continuous(limits=c(4,6),
                     breaks=4:6) +
  labs(x="My life is determined by my own actions",
       y="")

Weighted values by county

loc_cnty_wt <- svyrdat %>%
  group_by(county) %>%
  summarise(loc = survey_mean(Locus, na.rm=T)) %>%
  mutate(lower=loc-1.96*loc_se,
         upper=loc+1.96*loc_se)

loc_cnty_wt
countylocloc_selowerupper
Akobo-0.166 0.102 -0.3670.0349  
Budi-0.01520.135 -0.28 0.25    
Duk0.541 0.08370.3770.705   
Leer-0.144 0.0737-0.2880.000862
Pibor0.36  0.02450.3120.408   
Uror-0.299 0.117 -0.528-0.0708  
ggplot(loc_cnty_wt, aes(loc, fct_reorder(county, loc))) + 
  geom_vline(xintercept=0, size=1, alpha=.7, color="grey60") +
#  geom_vline(xintercept=4, size=1, alpha=.7, color="aquamarine3") + 
  geom_point(color="dodgerblue", size=3) + 
  geom_errorbar(aes(xmin=lower, xmax=upper),
                color="dodgerblue",
                width=0,
                size=1.1) +
  scale_x_continuous(limits=c(-1,1),
                     breaks=seq(-1,1,.5)) +
  labs(x="Locus of control index",
       y="")
## Error in ggplot(loc_cnty_wt, aes(loc, fct_reorder(county, loc))): object 'loc_cnty_wt' not found

5 Concluding thoughts

  • Aspirations index items don’t play well together
  • Reverse coded items confounded respondents
  • Locus of control sub-index did work and can be used instead of the larger aspirations index